fix duplication accuracy level calculations logic bug #649

Kuingsmile · 2026-01-08T08:03:30Z

In this PR I made two fixes to the duplicate calculation:

Fix Bloom membership logic: AND the result across all hashes (`isDup &= (ret & byte) != 0;`) instead of overwriting `isDup` on each iteration (`isDup = (ret & byte) != 0;`). The old behavior effectively depended only on the last hash, which could report incorrect duplicates. I saw this error in my RNA-seq analysis results.
Fix non-power-of-two masking at accuracy level 6: at level 6, `mBufNum=6`, so `PRIME_ARRAY_LEN * mBufNum` is not a power of two. In that case `offset &= mask` is not equivalent to a modulo and biases the indexing. I changed level 6 to use `mBufNum=8` (`mBufNum *= 4`), which makes `PRIME_ARRAY_LEN * mBufNum` a power of two, so the existing `offset &= mask` logic is correct.
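
To illustrate the first fix, here is a minimal standalone sketch of a Bloom-style membership test. It is not the project's code: `ToyBloom`, its bit layout, and the probe positions are hypothetical and only mirror the `isDup` pattern described above. The overwriting variant reports a hit whenever the last probed bit is set, while the AND variant requires every probed bit to be set.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical toy Bloom filter: one bit per slot, several probe positions per key.
struct ToyBloom {
    std::vector<uint8_t> bits;

    explicit ToyBloom(std::size_t nbits) : bits((nbits + 7) / 8, 0) {}

    // Set the bit at every probe position of a key.
    void add(const std::vector<std::size_t>& positions) {
        for (std::size_t pos : positions) {
            bits[pos / 8] |= static_cast<uint8_t>(1u << (pos % 8));
        }
    }

    // Old pattern: each iteration overwrites isDup, so only the last probe decides.
    bool containsOverwrite(const std::vector<std::size_t>& positions) const {
        bool isDup = true;
        for (std::size_t pos : positions) {
            uint8_t byte = static_cast<uint8_t>(1u << (pos % 8));
            uint8_t ret = bits[pos / 8];
            isDup = (ret & byte) != 0;
        }
        return isDup;
    }

    // Fixed pattern: isDup stays true only if every probed bit is set.
    bool containsAll(const std::vector<std::size_t>& positions) const {
        bool isDup = true;
        for (std::size_t pos : positions) {
            uint8_t byte = static_cast<uint8_t>(1u << (pos % 8));
            uint8_t ret = bits[pos / 8];
            isDup &= (ret & byte) != 0;
        }
        return isDup;
    }
};
```

For example, after `add({3, 17, 42})`, a different key probing `{5, 9, 42}` shares only its last position with the stored key. `containsOverwrite` reports it as a duplicate, and `containsAll` does not.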
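
The masking issue in the second fix can be checked with a small sketch. It assumes the mask is computed as `size - 1`, where `size` stands for `PRIME_ARRAY_LEN * mBufNum`. The concrete sizes used here (24 and 32, i.e. a stand-in array length of 4 times 6 or 8) are hypothetical and chosen only to keep the numbers small. The helper names are also illustrative.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// True when n is a power of two (n > 0).
bool isPowerOfTwo(uint64_t n) {
    return n != 0 && (n & (n - 1)) == 0;
}

// Count how many distinct slots in [0, size) can be produced by `h & (size - 1)`.
// Assumption: the mask is size - 1, as is usual for mask-based indexing.
std::size_t reachableWithMask(uint64_t size) {
    std::vector<bool> hit(size, false);
    const uint64_t mask = size - 1;
    // Sweep enough hash values to exercise every low-bit pattern.
    for (uint64_t h = 0; h < 4 * size; ++h) {
        hit[h & mask] = true;
    }
    std::size_t count = 0;
    for (bool b : hit) {
        if (b) ++count;
    }
    return count;
}
```

With a size of 24 the mask is 23 (binary `10111`), so any index with bit 3 set is never produced and only 16 of the 24 slots are reachable. With a size of 32 the mask is 31 and all 32 slots are reachable, which is why masking matches modulo only for power-of-two sizes.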

fix duplication accuracy level calculations logic bug

2064440

sfchen merged commit fb04a1a into OpenGene:master Jan 14, 2026
1 of 2 checks passed
